Landscape of solutions in constraint satisfaction problems 






Marc Mézard,¹ Matteo Palassini,²,¹ and Olivier Rivoire¹

¹ Laboratoire de Physique Théorique et Modèles Statistiques, Université Paris-Sud, F-91405 Orsay, France.
² Departament de Física Fonamental, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona, Spain.

(Dated: February 2, 2008) 

We present a theoretical framework for characterizing the geometrical properties of the space of 
solutions in constraint satisfaction problems, together with practical algorithms for studying this 
structure on particular instances. We apply our method to the coloring problem, for which we obtain 
the total number of solutions and analyze in detail the distribution of distances between solutions. 

Constraint satisfaction problems (CSPs) offer a unified language describing many complex
systems. Originally investigated by computer scientists in relation with algorithmic
complexity [1], CSPs have recently attracted much interest within the physics community,
following the discovery of their close ties with spin-glass theory [2, 3]. They are currently
used to tackle systems as diverse as, among others, error-correcting codes [4], rigidity
models [5], and regulatory genetic networks [6]. The ubiquity of CSPs stems from their
general nature: given a set of $N$ discrete variables subject to $M$ constraints, a CSP
consists in deciding whether there are assignments of the variables satisfying all the
constraints. Of special interest is the class of NP-complete problems, for which no
algorithm is known that is guaranteed to decide the satisfiability of a problem instance in
a time polynomial in $N$. A well-studied example is the $q$-coloring problem ($q$-COL):
given a graph with $N$ nodes and $M$ edges connecting certain pairs of nodes, and given $q$
colors, can we color the nodes so that no two connected nodes share a common color?

Much insight into CSPs is gained by focusing on typical instances drawn from an ensemble
with a fixed density of constraints $\alpha = M/N$. As $\alpha$ is varied, a threshold
phenomenon is generically observed. Below a critical value $\alpha_c$, instances are
typically satisfiable (SAT phase): at least one satisfying assignment (or solution) exists
with probability one when $N \to \infty$; above $\alpha_c$ they are typically unsatisfiable
(UNSAT phase). Rigorous bounds on $\alpha_c$ have been derived [?]. The running time of
algorithms often increases greatly near $\alpha_c$.

CSPs enter the standard framework of statistical physics by associating to each assignment
of the $N$ variables $\sigma = \{\sigma_i\}_{i=1}^{N}$ an energy $E[\sigma]$ defined as the
number of constraints violated by $\sigma$. The satisfiability problem reduces to the
determination of the ground-state energy $E_0 = \min_\sigma E[\sigma]$: if $E_0 > 0$ the
instance is UNSAT, if $E_0 = 0$ it is SAT. In recent years, several methods borrowed from
statistical physics [?] have pointed to the existence of a second threshold
$\alpha_d < \alpha_c$, associated with clustering of the space $S$ of all solutions. For
$\alpha < \alpha_d$ (Easy-SAT phase), $S$ is typically connected: any two solutions are
joined by a path of moves involving a finite number of variables. For
$\alpha_d < \alpha < \alpha_c$ (Hard-SAT phase), $S$ is typically disconnected: solutions
gather into clusters far apart from each other (as illustrated in Fig. 1a), which can only
be joined by moves involving a finite fraction of the variables. This scenario, which has
been confirmed rigorously in some cases [11], suggests that computational hardness may be
caused by the trapping of local algorithms in metastable clusters, which are
exponentially more numerous than clusters of solutions.
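
As a small-scale illustration of what a cluster is, the following Python sketch enumerates all solutions of a tiny coloring instance by brute force and groups them into connected components. It assumes, for illustration only, that two solutions are adjacent when they differ on a single node, whereas the large-$N$ definition above allows moves involving any finite number of variables. The function names `solutions` and `clusters` are hypothetical and are not part of the method presented in this Letter.

```python
from itertools import product

def solutions(n, edges, q):
    """Brute-force list of all proper q-colorings of a graph on nodes 0..n-1."""
    return [s for s in product(range(q), repeat=n)
            if all(s[i] != s[j] for i, j in edges)]

def clusters(sols, q):
    """Connected components of the solution set under single-node recolorings
    (an illustrative assumption, not the thermodynamic-limit definition)."""
    sol_set = set(sols)
    seen, comps = set(), []
    for start in sols:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            s = stack.pop()
            comp.append(s)
            # Try every recoloring of every single node.
            for i in range(len(s)):
                for c in range(q):
                    t = s[:i] + (c,) + s[i + 1:]
                    if t in sol_set and t not in seen:
                        seen.add(t)
                        stack.append(t)
        comps.append(comp)
    return comps

triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
tri_clusters = clusters(solutions(3, triangle, 3), 3)
path_clusters = clusters(solutions(3, path, 3), 3)
print(len(tri_clusters), len(path_clusters))
```

With three colors, every solution of the triangle is isolated under single-node moves, while the solutions of the three-node path form a single component. These toy graphs only illustrate the notion of connectivity; they say nothing about the thresholds $\alpha_d$ and $\alpha_c$, which are properties of large random instances.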

In this Letter, we introduce methods to analyze in detail the structure of the solution
space of CSPs in the Hard-SAT phase. The first aspect we analyze is the entropic structure.
A cluster $A$ typically contains an exponential number of solutions,
$|A| \asymp \exp(N s_A)$, where $N s_A$ is the internal entropy of $A$ (we write
$a_N \asymp b_N$ when $\ln a_N / \ln b_N \to 1$ as $N \to \infty$). We introduce the
entropic complexity $\Sigma_s(s)$ that counts the number
$\mathcal{N}_N(s) \asymp \exp[N \Sigma_s(s)]$ of clusters with internal entropy $Ns$, and a
method for computing $\Sigma_s(s)$ and the total entropy density $s_{tot}$, yielding the
total number of solutions, $|S| \asymp \exp(N s_{tot})$, for individual instances of any
CSP. The problem of counting the number of solutions of a CSP is in general #P-complete, a
class of problems even harder than NP-complete. Estimating $|S|$ is important in
applications such as graph reliability [12] and computing partition functions.

A second, related aspect of the structure of $S$ is its geometry. We introduce a method to
compute the geometric complexity $\Sigma_d(d)$, which counts the number of clusters at a
given distance $Nd$ from a reference assignment (see Fig. 4), and the related weight
enumerator function, of direct interest in coding theory [?]. Finally, we indicate several
generalizations of these methods.

FIG. 1: (a) Illustration of the clustering phenomenon. In the Easy-SAT phase
$\alpha < \alpha_d$, all the solutions are connected. In the Hard-SAT phase
$\alpha_d < \alpha < \alpha_c$, solutions separate into distinct clusters. (b) Notations
used in the cavity approach: the message (cavity field) $\psi_\sigma^{(i \to j)}$ gives the
probability that node $i$ has color $\sigma$ in the absence of node $j$.

Our methods are based on extensions of the "energetic" cavity method (CM) of Ref. [13]. We
illustrate them for $q$-COL and show numerical results for $q = 3$, but we emphasize that
any CSP can be studied along the same lines. The energy function associated to $q$-COL is
that of the antiferromagnetic Potts model,
$E[\sigma] = \sum_{(i,j)} \delta_{\sigma_i,\sigma_j}$, where
$\sigma_i \in \{1, \ldots, q\}$ and the sum is over the $M$ graph edges. We study
Erdős-Rényi random graphs constructed by connecting any pair of nodes with probability
$2\alpha/N$. For large $N$ this gives $M \simeq \alpha N$ and a Poisson-distributed
connectivity with mean $2\alpha$.
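
The graph ensemble and the Potts energy just described can be written down in a few lines. The Python sketch below is only an illustration under our own naming (`erdos_renyi` and `potts_energy` are hypothetical helpers); colors are labeled $0, \ldots, q-1$ rather than $1, \ldots, q$, and the pair-by-pair sampling is quadratic in $N$, so it is meant for small sizes.

```python
import random

def erdos_renyi(n, alpha, rng):
    """Edge list of a random graph in which each pair of nodes is connected
    independently with probability 2*alpha/n."""
    p = 2.0 * alpha / n
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

def potts_energy(sigma, edges):
    """Antiferromagnetic Potts energy: number of edges whose two end nodes
    carry the same color (i.e. number of violated constraints)."""
    return sum(1 for i, j in edges if sigma[i] == sigma[j])

rng = random.Random(0)
n, alpha, q = 1000, 2.0, 3
edges = erdos_renyi(n, alpha, rng)
sigma = [rng.randrange(q) for _ in range(n)]
print(len(edges) / n)                  # close to alpha for large n
print(potts_energy(sigma, edges) / n)  # energy density of a random assignment
```

A uniformly random assignment violates each edge with probability $1/q$, so its energy density is close to $\alpha/q$; a solution is an assignment for which `potts_energy` returns zero.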

In the unclustered phase ("replica symmetric" phase in the language of spin-glass theory)
the zero-temperature energetic CM [13] computes the ground-state energy recursively by
adding one node at a time. For large enough $\alpha$, the recursion no longer admits a
unique solution and is generalized, via the one-step replica symmetry breaking (1RSB)
Ansatz, to a distributional recursion which can be solved self-consistently, yielding the
"energetic" complexity $\Sigma_e(e)$, which counts the number
$\mathcal{N}_N(e) \asymp \exp[N \Sigma_e(e)]$ of clusters of local minima with energy
$E = Ne$. In particular, $\Sigma_e(0)$ is found positive in an interval
$\alpha \in [\alpha_d^{(f)}, \alpha_c]$. The method was applied to $q$-COL in Refs. [?],
which report $\alpha_d^{(f)} \simeq 2.21$, $\alpha_c \simeq 2.34$ for $q = 3$ (see also
Fig. 3). The validity of the 1RSB Ansatz in an interval $[\alpha_m, \alpha_{sp}]$
containing $\alpha_c$ was established for $q$-COL in Ref. [?], using the stability analysis
of Ref. [?], with $\alpha_m \simeq 2.25$, $\alpha_{sp} \simeq 2.50$ for $q = 3$.

Counting solutions - The energetic CM has the virtue of being simple, and it thus allows a
precise determination of $\alpha_c$ and the development of a powerful new class of
algorithms (survey propagation [10]). This simplicity is obtained because one focuses only
on clusters in which some of the variables are frozen, i.e. constrained to adopt a unique
color. Computing the entropy requires more detailed information, and thus a different
formalism, as first identified in Ref. [9] within the replica framework. Our approach to
computing entropies is illustrated in Fig. 1b. The basic quantity we consider is the number
$Z_{\sigma_i}^{(i \to j)}$ of solutions for the "cavity" graph obtained from the original
graph by removing node $j$, when the color of node $i$ is fixed to $\sigma_i$. In the
unclustered phase, due to the locally tree-like structure of large random graphs, the
quantities $Z_{\sigma_k}^{(k \to i)}$, with $k$ denoting any of the nodes connected to $i$
except $j$ (in symbols, $k \in i - j$), are independent of each other for large $N$. Hence
a recursion relation holds,

$$Z_{\sigma_i}^{(i \to j)} = \prod_{k \in i-j} \sum_{\sigma_k \neq \sigma_i} Z_{\sigma_k}^{(k \to i)}.$$

By defining a cavity field as the probability of having color $\sigma$ on node $i$ in the
absence of $j$, $\psi_\sigma^{(i \to j)} = Z_\sigma^{(i \to j)} / \sum_\tau Z_\tau^{(i \to j)}$,
the recursion relation translates to

$$\psi_\sigma^{(i \to j)} = \left(Z^{(i \to j)}\right)^{-1} \prod_{k \in i-j} \left(1 - \psi_\sigma^{(k \to i)}\right) \qquad (1)$$

with $Z^{(i \to j)}$ fixed by normalization. The ensemble of these equations on all oriented
links, known as belief propagation equations [21], has a unique solution for
$\alpha < \alpha_d$. In general $\alpha_d \le \alpha_d^{(f)}$ holds, since $\alpha_d^{(f)}$
refers to the onset of clusters with frozen variables while at $\alpha_d$ clusters without
frozen variables may also appear. It is not difficult to show that the total entropy of the
whole graph is given by $N s_{tot} = \sum_i \Delta S^{(i)} - \sum_{(i,j)} \Delta S^{(i,j)}$
where, similarly to the energetic CM, we need to subtract the link contributions
$\Delta S^{(i,j)} = \ln\left(1 - \sum_\sigma \psi_\sigma^{(i \to j)} \psi_\sigma^{(j \to i)}\right)$
from the node contributions
$\Delta S^{(i)} = \ln \sum_\sigma \prod_{k \in i} \left(1 - \psi_\sigma^{(k \to i)}\right)$
to avoid double counting. Above $\alpha_d$, following the 1RSB Ansatz [22], we assume the
existence of many clusters. We then compute a potential $\phi(x)$ related to the entropic
complexity $\Sigma_s(s)$ through

$$e^{N \phi(x)} = \sum_A e^{x N s_A} = \int_{s_{min}}^{s_{max}} ds \, e^{N[\Sigma_s(s) + x s]} \qquad (2)$$

where $x$ is a Lagrange multiplier which fixes the internal entropy and $s_{min}$,
$s_{max}$ are the points at which $\Sigma_s(s)$ vanishes. Assuming the independence of the
quantities $Z_{\sigma_k}^{(k \to i)}$ within each cluster, we introduce probability
distributions of the cavity fields $P^{(i \to j)}(\psi^{(i \to j)})$ with respect to the
clusters, and generalize the cavity recursion to

$$P^{(i \to j)}(\psi^{(i \to j)}) \propto \int \prod_{k \in i-j} dP^{(k \to i)}(\psi^{(k \to i)}) \, \left(Z^{(i \to j)}\right)^{x} \, \delta\left(\psi^{(i \to j)} - \hat\psi(\{\psi^{(k \to i)}\})\right) \qquad (3)$$

where $\hat\psi$ denotes the right-hand side of Eq. (1) and $Z^{(i \to j)}$ its
normalization. After solving Eq. (3), the potential is computed as

$$N \phi(x) = \sum_i \ln \int \prod_{k \in i} dP^{(k \to i)}(\psi^{(k \to i)}) \, e^{x \Delta S^{(i)}(\{\psi^{(k \to i)}\})} - \sum_{(i,j)} \ln \int dP^{(i \to j)}(\psi^{(i \to j)}) \, dP^{(j \to i)}(\psi^{(j \to i)}) \, e^{x \Delta S^{(i,j)}(\psi^{(i \to j)}, \psi^{(j \to i)})} \qquad (4)$$

where $\Delta S^{(i)}$, $\Delta S^{(i,j)}$ are given above. A saddle-point evaluation of
Eq. (2) gives $x = -\partial_s \Sigma_s(s)$. Hence, from $\phi(x)$ we obtain $\Sigma_s(s)$
via the Legendre transform $s(x) = \partial_x \phi(x)$,
$\Sigma_s(x) = \phi(x) - x s(x)$. We solve Eq. (3) numerically on individual graphs by
representing the distributions $P^{(i \to j)}$ with a population of $N_p$ cavity fields on
each oriented link. The resulting message passing algorithm is an entropic generalization
of survey propagation [10].

FIG. 2: Entropic complexity for three individual Erdős-Rényi graphs with $N = 5000$ and
different values of $\alpha = M/N$. Data obtained with a population size $N_p = 512$ on
each oriented link (we verified that using $N_p = 4096$ gives a change smaller than the
error bars). The dotted lines are obtained by a polynomial fit of the potential $\phi(x)$,
the symbols by direct computation of the derivative $\partial_x \phi(x)$ in Eq. (4) [?].

Our entropic CM provides greater information at the price of greater computational
difficulty, due to the continuous nature of the cavity fields.
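
The replica-symmetric building blocks of this construction, namely the recursion of Eq. (1) and the entropy obtained from node and link contributions, can be checked on a small tree, where belief propagation is exact. The Python sketch below is a plain iteration of Eq. (1) with uniform initial messages; it is not the population-based 1RSB algorithm used for the results of this Letter, and the function name `bp_entropy`, the fixed iteration count, and the example tree are assumptions made for illustration.

```python
import math
from itertools import product

def bp_entropy(n, edges, q, iters=50):
    """Iterate Eq. (1) on all oriented links, then return the entropy
    N*s_tot = sum of node contributions - sum of link contributions."""
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # psi[(i, j)][a]: probability that node i has color a in the absence of j
    psi = {(i, j): [1.0 / q] * q for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in psi:
            w = [math.prod(1.0 - psi[(k, i)][a] for k in nbrs[i] if k != j)
                 for a in range(q)]
            z = sum(w)  # normalization Z^(i->j)
            new[(i, j)] = [x / z for x in w]
        psi = new
    node = sum(math.log(sum(math.prod(1.0 - psi[(k, i)][a] for k in nbrs[i])
                            for a in range(q)))
               for i in range(n))
    link = sum(math.log(1.0 - sum(psi[(i, j)][a] * psi[(j, i)][a]
                                  for a in range(q)))
               for i, j in edges)
    return node - link

tree = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]
exact = sum(all(s[i] != s[j] for i, j in tree)
            for s in product(range(3), repeat=6))
estimate = math.exp(bp_entropy(6, tree, 3))
print(exact, estimate)
```

On a tree the exponential of the returned entropy matches the brute-force count of colorings up to rounding. On graphs with loops the same code returns only the replica-symmetric estimate, which the text above states is appropriate for $\alpha < \alpha_d$.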

Figure 2 displays some of our results for 3-COL, for three individual graphs with
$N = 5000$. In particular, the total entropy of solutions is given by
$s_{tot} = \phi(1) = s_{max}$, where the last equality holds because, according to our
numerical results in Fig. 2, $\Sigma_s$ vanishes at $x = x^* < 1$, with
$s_{max} = s(x = x^*)$. Therefore, for 3-COL the total entropy is dominated by a
subexponential number of giant clusters: a randomly chosen solution falls almost surely in
one of these rare clusters. We also find that the fraction of frozen variables is finite in
the interval $[s_{min}, s_{max}]$.
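
The mechanism behind $s_{tot} = \phi(1) = s_{max}$ can be made explicit on a toy example. The Python sketch below assumes a purely hypothetical parabolic complexity (the numbers are chosen for convenience and are not data from this Letter), evaluates the potential as the saddle point of Eq. (2), $\phi(x) = \max_{s \in [s_{min}, s_{max}]} [\Sigma_s(s) + x s]$, and recovers the complexity through the Legendre transform. With the chosen parameters the complexity vanishes at $x^* = 0.5 < 1$, so the maximum at $x = 1$ sits at the boundary $s_{max}$.

```python
def sigma_toy(s, sigma0=0.01, s0=0.10, v=0.08):
    """Hypothetical parabolic complexity (illustration only)."""
    return sigma0 - (s - s0) ** 2 / (2.0 * v)

# Zeros of sigma_toy for the default parameters: s0 -/+ sqrt(2 * v * sigma0).
S_MIN, S_MAX = 0.06, 0.14
GRID = [S_MIN + (S_MAX - S_MIN) * k / 20000 for k in range(20001)]

def phi(x):
    """Saddle-point potential: max over s in [s_min, s_max] of Sigma(s) + x*s."""
    return max(sigma_toy(s) + x * s for s in GRID)

def legendre(x, h=1e-3):
    """Return (s(x), Sigma(x)) with s = d(phi)/dx and Sigma = phi - x*s."""
    s = (phi(x + h) - phi(x - h)) / (2.0 * h)
    return s, phi(x) - x * s

s_tot = phi(1.0)
s_x, sigma_x = legendre(0.25)
print(s_tot, S_MAX)                  # total entropy versus s_max
print(s_x, sigma_x, sigma_toy(s_x))  # Legendre transform versus direct value
```

For $|x| < x^*$ the maximum lies in the interior of the interval and the Legendre transform reproduces the assumed curve; for $x > x^*$ the maximum is pinned at $s_{max}$, where the complexity is zero, which is the situation described above for $x = 1$.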

We also implemented a version of Eq. (3) averaged over Erdős-Rényi graphs, by considering a
population of links with Poisson connectivity and a population of cavity fields on each
link. Figure 3 shows the graph averages obtained in this way for $s_{tot}$, the typical
internal entropy $s_{typ} = \mathrm{argmax}_s \, \Sigma_s(s)$, and the typical complexity
$\Sigma_{typ} = \Sigma_s(s_{typ}) \, [= \Sigma_e(0)]$, as a function of $\alpha$. The
graph-averaged complexity curves $\Sigma_s(s)$ (not shown) resemble those in Fig. 2.
(Graph-to-graph fluctuations for $N = 5000$ are significant: for $\alpha = 2.3$, the
standard deviation of $\Sigma_{typ}$ is about 27% of the mean.) The negative complexities
in Fig. 2 have no direct interpretation on individual graphs, but for the graph-averaged
case they are related to rare atypical graphs [23].

The above formalism can be generalized to yield $\Sigma_{e,s}(e, s)$, the complexity
associated with metastable clusters of energy $Ne > 0$ and entropy $Ns$, with
$\Sigma_{e,s}(0, s) = \Sigma_s(s)$, by adding a second multiplier $y$ [25]. Equivalent
information is contained in the finite-temperature complexity $\Sigma_f(f; \beta)$ [?],
where $f$ is the free energy and $\beta$ the inverse temperature, based on the identity

$$y e - x s = y f$$

with $f = e - s/\beta$ and $y = \beta x$. The energetic CM is recovered for
$\beta \to \infty$ and $x \to 0$ with $y = \beta x$ fixed, which amounts to ignoring all
entropic effects [?].

Counting clusters at a given distance - We now turn to the geometric structure and show how
the CM can be used to investigate inter-cluster distances. We illustrate this by addressing
the problem of counting the number of clusters as a function of their distance from a fixed
reference configuration $\sigma^0$, which we rephrase as a new CSP, named dCSP, whose
thermodynamics reflects the geometry of the solution space of the initial CSP. The valid
assignments of dCSP are the solutions $\sigma \in S$ of the initial CSP: these are
configurations of zero energy, and in this sense dCSP concentrates on the zero-temperature
case of the original problem. But we introduce in dCSP a new energy function which is the
Hamming distance from $\sigma^0$,
$E_d[\sigma] = \sum_i (1 - \delta_{\sigma_i, \sigma_i^0})$. Therefore the clusters (resp.,
assignments) of dCSP with energy $E_d$ are the zero-energy clusters (resp., solutions) at
distance $E_d$ from $\sigma^0$ in the initial CSP.
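
On a very small instance the dCSP construction can be spelled out by exhaustive enumeration. The Python sketch below keeps only the solutions of the coloring problem (the valid assignments of dCSP) and tabulates their Hamming-distance energy from a reference configuration. It counts individual solutions rather than clusters, and it is a brute-force illustration, not the cavity computation described in the text; the names `hamming_energy`, `distance_spectrum` and `ref` are hypothetical.

```python
from collections import Counter
from itertools import product

def hamming_energy(sigma, ref):
    """dCSP energy: number of nodes whose color differs from the reference."""
    return sum(1 for a, b in zip(sigma, ref) if a != b)

def distance_spectrum(n, edges, q, ref):
    """Number of solutions of q-COL at each Hamming distance from ref."""
    counts = Counter()
    for sigma in product(range(q), repeat=n):
        if all(sigma[i] != sigma[j] for i, j in edges):  # zero-energy only
            counts[hamming_energy(sigma, ref)] += 1
    return dict(sorted(counts.items()))

triangle = [(0, 1), (1, 2), (0, 2)]
ref = (0, 1, 2)
spectrum = distance_spectrum(3, triangle, 3, ref)
print(spectrum)
print(min(spectrum), max(spectrum))  # minimal and maximal distance
```

The smallest and largest keys of the returned dictionary are the answers to the optimization version of dCSP, i.e. the minimal and maximal distance between the reference and a solution.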

The optimization problem for dCSP consists in finding the maximal (or the minimal) distance
between $\sigma^0$ and a solution of the original problem. By applying the energetic CM to
this problem [?], we obtain a geometric complexity $\Sigma_d(d)$ giving the number of
clusters at distance $Nd$ from $\sigma^0$,
$\mathcal{N}_N(d) \asymp \exp[N \Sigma_d(d)]$. Figure 4 shows results for 3-COL on
individual graphs. Two features are worth noticing: i) $\Sigma_d(d)$ becomes positive only
above a threshold $d_{min}$, reflecting the fact that clusters are well separated; ii) a
plateau appears between $d_1$ and $d_2$, reflecting the finite diameter of clusters. We
have verified that the size of this plateau coincides with the typical diameter computed
within the entropic CM [25].

FIG. 3: Graph-averaged $\Sigma_{typ}$, $s_{typ}$, and $s_{tot}$. Notice the different
vertical scales. Data obtained with a population of 16000 links, and $N_p = 512$ fields on
each link. The vertical line shows the threshold $\alpha_m = 2.255$ below which the 1RSB
Ansatz is unstable [?]. The straight line is the "liquid" or infinite-temperature solution,
$s_{liq} = (1 - \alpha) \ln(q) + \alpha \ln(q - 1)$.

FIG. 4: Geometric complexity for the same individual graphs as in Fig. 2. Within the error
bars, $\Sigma_d(d_1) = \Sigma_d(d_2) = \Sigma_s(s_{typ})$. The horizontal lines are a guide
to the eye.

Generalizations - The above method can be extended to count the number of solutions at
distance $Nd$ from $\sigma^0$, known as the weight enumerator function $A_N(d)$ in coding
theory. This can be deduced from the complexity $\Sigma_{d,s}(d, s)$ which gives the number
of clusters with internal entropy $Ns$ at distance $Nd$ from the reference configuration
$\sigma^0$. Such a complexity can be obtained by studying the dCSP with a finite value of a
new inverse temperature $\beta_d$, which is conjugate to the energy $E_d$ (keeping the
original temperature $\beta^{-1}$ at zero). Once $\Sigma_{d,s}(d, s)$ has been found, one
obtains the leading behaviour of the weight enumerator as
$A_N(d) \asymp \exp[N \max_s(\Sigma_{d,s}(d, s) + s)]$. In the same spirit, our analysis
can be extended to metastable configurations: in order to compute the complexity
$\Sigma_{e,d,s}(e, d, s)$ counting clusters with energy $Ne$, entropy $Ns$, at distance
$Nd$ from $\sigma^0$, one needs to introduce three Lagrange multipliers $x, y, z$. All the
previous complexities are particular limits of this more general framework [?].

Conclusions - We have presented methods to ana- 
lyze the entropic and geometric structure of the clustered 
phase in q-COL, which give access to quantities such as 
internal cluster entropies not accessible to previous meth- 
ods. Our results for 3-COL show the existence of giant, 
atypical clusters which contain the majority of solutions. 
Generalization to other CSPs such as $k$-SAT, where a
similar picture may hold, is straightforward. 

Notice that the present results were obtained within a 1RSB Ansatz, and the stability of
our solution should thus be checked (extending the method of Ref. [?]) to assess whether
the solution is exact or only an approximation to a more complicated one involving
higher-order RSB.

The new information extracted with our entropic CM could be exploited to design new
algorithms for finding solutions to individual instances, improving on present survey
propagation algorithms, which only use energetic information [10]. We also envision
applications to inference problems such as Bayesian belief networks [?].

We thank D. Battaglia and R. Zecchina for discussions. [...] under contracts
HPRN-CT-2002-00319 (STIPCO) and by the Community's EVERGROW Integrated Project.

Note added: the recent paper [?] addresses similar questions.